Reducing computational load in segmental hidden Markov model decoding for speech recognition
Author
Abstract
Introduction: Research into segment models (SMs) for automatic speech recognition is motivated by limitations of conventional hidden Markov models (HMMs). While HMMs associate states with individual feature vectors, SMs associate states with sequences of vectors (segments) [1], or variable duration acoustic features [2], thereby allowing important static and dynamic structure to be modelled. Glass [2] reports state-of-the-art phone recognition on TIMIT [3] using an SM. Segmental HMMs (SHMMs) can outperform comparable HMMs [4], but computational load increases. In standard notation, the basic step in HMM Viterbi decoding for a sequence of vectors $y_1, \ldots, y_t, \ldots, y_T$ is:

$$\alpha_t(i) = \max_j \, \alpha_{t-1}(j)\, a_{ji}\, b_i(y_t) \qquad (1)$$
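The recursion in equation (1) can be illustrated with a minimal log-domain sketch. This is not the decoder described in the paper: the function name `viterbi`, the initial distribution `log_pi`, the back-pointer bookkeeping, and the toy two-state model are all assumptions introduced here for illustration. For N states and T vectors, each time step evaluates the maximum over j for every state i, so this plain HMM recursion costs on the order of N²T operations.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Return log(p), mapping zero probability to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(log_pi, log_a, log_b):
    """Log-domain Viterbi decoding for a standard HMM.

    log_pi[i]   : log initial probability of state i (assumed, not in the source)
    log_a[j][i] : log transition probability a_ji from state j to state i
    log_b[t][i] : log observation likelihood b_i(y_t)
    Returns (best log score, best state sequence).
    """
    num_frames, num_states = len(log_b), len(log_pi)
    alpha = [log_pi[i] + log_b[0][i] for i in range(num_states)]
    backptrs = []
    for t in range(1, num_frames):
        new_alpha, ptr = [], []
        for i in range(num_states):
            # Equation (1) in the log domain: maximise over predecessor states j.
            best_j = max(range(num_states), key=lambda j: alpha[j] + log_a[j][i])
            new_alpha.append(alpha[best_j] + log_a[best_j][i] + log_b[t][i])
            ptr.append(best_j)
        alpha = new_alpha
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(num_states), key=lambda i: alpha[i])
    best_score = alpha[state]
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return best_score, path


# Hypothetical two-state left-to-right model and three observation frames.
toy_pi = [safe_log(p) for p in (1.0, 0.0)]
toy_a = [[safe_log(p) for p in row] for row in ((0.5, 0.5), (0.0, 1.0))]
toy_b = [[safe_log(p) for p in row] for row in ((0.9, 0.1), (0.8, 0.2), (0.1, 0.9))]

if __name__ == "__main__":
    score, states = viterbi(toy_pi, toy_a, toy_b)
    print(math.exp(score), states)
```

Working in log probabilities turns the products in equation (1) into sums, which avoids numerical underflow on long sequences.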
Similar resources
Performing Speech Recognition on Multiple Parallel Files Using Continuous Hidden Markov Models on an FPGA
Speech recognition is a computationally demanding task, particularly the stages which use Viterbi decoding for converting pre-processed speech data into words or subword units, and the associated observation probability calculations, which employ multivariate Gaussian distributions; so any device that can reduce the load on, for example, a PC’s processor, is advantageous. Hence we present ...
Telephone Speech Recognition via the Combination of Knowledge Sources in a Segmental Speech Model
The currently dominant speech recognition methodology, Hidden Markov Modeling, treats speech as a stochastic random process with very simple mathematical properties. The simplistic assumptions of the model, and especially that of the independence of the observation vectors have been criticized by many in the literature, and alternative solutions have been proposed. One such alternative is segme...
GiniSupport vector machines for segmental minimum Bayes risk decoding of continuous speech
We describe the use of Support Vector Machines (SVMs) for continuous speech recognition by incorporating them in Segmental Minimum Bayes Risk decoding. Lattice cutting is used to convert the Automatic Speech Recognition search space into sequences of smaller recognition problems. SVMs are then trained as discriminative models over each of these problems and used in a rescoring framework. We pos...
Efficient Methods for Automatic Speech Recognition
This thesis presents work in the area of automatic speech recognition (ASR). The thesis focuses on methods for increasing the efficiency of speech recognition systems and on techniques for efficient representation of different types of knowledge in the decoding process. In this work, several decoding algorithms and recognition systems have been developed, aimed at various recognition tasks. The...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...